Knowledge Discovery in Biosequences Using Sort Regular Patterns

Authors

  • Toru Takae
  • Toru Kasai
  • Hiroki Arimura
  • Takeshi Shinohara
Abstract

This paper considers knowledge discovery by sort regular patterns, which are strings over sort letters representing finite sets of basic letters. We devise a learning algorithm for the class based on the minimal multiple generalization technique, and evaluate the method by experiments on biosequences from the GenBank database. The experiments show that a relatively simple sort pattern can represent a complex motif in biosequences, and that the learning algorithm works well on noisy examples.
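
To make the notion of a sort letter concrete, here is a minimal Python sketch of matching a string of sort letters against a DNA sequence. It is an illustration only, not the paper's learning algorithm: the sort alphabet below borrows the IUPAC nucleotide codes (R for purines, Y for pyrimidines, N for any base) as an assumption, and the names SORTS, matches_at, and occurrences are hypothetical. The sketch covers only fixed-length strings of sort letters.

```python
# Hypothetical sort alphabet: each sort letter names a finite set of basic letters.
# The IUPAC-style choices below are an assumption made for illustration.
SORTS = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"},            # purines
    "Y": {"C", "T"},            # pyrimidines
    "N": {"A", "C", "G", "T"},  # any base
}


def matches_at(pattern, sequence, start):
    """Return True if each sort letter of `pattern` contains the aligned base."""
    if start < 0 or start + len(pattern) > len(sequence):
        return False
    return all(sequence[start + i] in SORTS[s] for i, s in enumerate(pattern))


def occurrences(pattern, sequence):
    """List every start position where `pattern` matches `sequence`."""
    last_start = len(sequence) - len(pattern)
    return [i for i in range(last_start + 1) if matches_at(pattern, sequence, i)]
```

For example, occurrences("TATAR", "GGTATAAGTATAGC") returns [2, 8]: the single sort letter R lets one short pattern cover both TATAA and TATAG.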

Similar resources

Reports in Informatics: Approaches to the Automatic Discovery of Patterns in Biosequences

This paper is a survey of approaches and algorithms used for the automatic discovery of patterns in biosequences. Patterns with the expressive power in the class of regular languages are considered, and a classification of pattern languages in this class is developed, covering those patterns which are the most frequently...

Approaches to the Automatic Discovery of Patterns in Biosequences

This paper surveys approaches to the discovery of patterns in biosequences and places these approaches within a formal framework that systematises the types of patterns and the discovery algorithms. Patterns with expressive power in the class of regular languages are considered, and a classification of pattern languages in this class is developed, covering the patterns that are the most frequen...

Reports in Informatics: Relation Patterns and Their Automatic Discovery in Biosequences

We have extended the pattern language used in PROSITE to enable it to describe dependencies between amino acid residues. We have developed a fitness measure, based on the minimum description length principle, that evaluates the significance of such patterns in relation to a set of sequences, and an algorithm that automatically finds significant patterns in unaligned sequences. Computing experiments are reported show...

Measuring Over-Generalization in the Minimal Multiple Generalizations of Biosequences

We consider the problem of finding a set of patterns that best characterizes a set of strings. To this end, Arimura et al. [3] considered the use of minimal multiple generalizations (mmg) for such characterizations. Given any sample set, the mmgs are, roughly speaking, the most (syntactically) specific set of languages containing the sample within a given class of languages. Takae et al. [17]...

Pattern Discovery from Biosequences

In this thesis we have developed novel methods for analyzing biological data, the primary sequences of the DNA and proteins, the microarray based gene expression data, and other functional genomics data. The main contribution is the development of the pattern discovery algorithm SPEXS, accompanied by several practical applications for analyzing real biological problems. For performing these bio...

Publication date: 2007